SampleHST: Efficient On-the-Fly Selection of Distributed Traces
Since only a small number of the traces generated by distributed tracing help in troubleshooting, their storage requirement can be significantly reduced by biasing the selection towards anomalous traces. To aid in this scenario, we propose SampleHST, a novel approach to sample on-the-fly from a stream of traces in an unsupervised manner. SampleHST adjusts the storage quota of normal and anomalous traces depending on the size of its budget. Initially, it utilizes a forest of Half Space Trees (HSTs) for trace scoring. This is based on the distribution of the mass scores across the trees, which characterizes the probability of observing different traces. The mass distribution from HSTs is subsequently used to cluster the traces online leveraging a variant of the mean-shift algorithm. This trace-cluster association eventually drives the sampling decision. We have compared the performance of SampleHST with a recently suggested method using data from a cloud data center and demonstrated that SampleHST improves sampling performance by up to 9.5×.
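The mass-based scoring step can be illustrated with a minimal sketch. This is not the SampleHST implementation: it assumes each trace has already been mapped to a numeric feature vector inside the unit box, and the function names, tree depth, and forest size are hypothetical choices made for illustration. Each tree repeatedly halves a randomly chosen dimension, a reference window of traces is used to count the mass in every leaf, and a new trace is scored by the summed mass of the leaves it lands in (low mass suggests an anomalous trace).

```python
import random

def build_tree(lo, hi, depth, rng):
    """Recursively halve a randomly chosen dimension of the box [lo, hi]."""
    if depth == 0:
        return {"mass": 0}  # leaf: only stores how many reference points fell here
    dim = rng.randrange(len(lo))
    mid = (lo[dim] + hi[dim]) / 2.0
    left_hi = list(hi)
    left_hi[dim] = mid
    right_lo = list(lo)
    right_lo[dim] = mid
    return {
        "dim": dim,
        "mid": mid,
        "left": build_tree(lo, left_hi, depth - 1, rng),
        "right": build_tree(right_lo, hi, depth - 1, rng),
    }

def leaf_for(tree, x):
    """Walk from the root to the leaf whose half-spaces contain x."""
    node = tree
    while "dim" in node:
        node = node["left"] if x[node["dim"]] < node["mid"] else node["right"]
    return node

def fit_mass(forest, points):
    """Record the mass (point count) of each leaf from a reference window."""
    for tree in forest:
        for x in points:
            leaf_for(tree, x)["mass"] += 1

def mass_profile(forest, x):
    """Per-tree leaf mass for x; the distribution across trees is the signal."""
    return [leaf_for(tree, x)["mass"] for tree in forest]

def mass_score(forest, x):
    """Summed leaf mass across the forest; lower means more unusual."""
    return sum(mass_profile(forest, x))

rng = random.Random(0)
forest = [build_tree([0.0, 0.0], [1.0, 1.0], 3, rng) for _ in range(25)]
reference = [[0.16 + 0.01 * i, 0.24 - 0.01 * i] for i in range(9)]
fit_mass(forest, reference)
print(mass_score(forest, [0.2, 0.2]), mass_score(forest, [0.9, 0.9]))
```

In this toy setup the trace near the reference cluster collects mass in every tree, while the distant trace lands only in empty leaves. The clustering and budget-aware sampling stages described in the abstract are not covered here.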
Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
Sequential video understanding, as an emerging video understanding task, has
attracted considerable attention from researchers because of its goal-oriented nature. This
paper studies weakly supervised sequential video understanding where the
accurate time-stamp level text-video alignment is not provided. We solve this
task by borrowing ideas from CLIP. Specifically, we use a transformer to
aggregate frame-level features for video representation and use a pre-trained
text encoder to encode the texts corresponding to each action and the whole
video, respectively. To model the correspondence between text and video, we
propose a multiple granularity loss, where the video-paragraph contrastive loss
enforces matching between the whole video and the complete script, and a
fine-grained frame-sentence contrastive loss enforces the matching between each
action and its description. As the frame-sentence correspondence is not
available, we propose to use the fact that video actions happen sequentially in
the temporal domain to generate pseudo frame-sentence correspondence and
supervise the network training with the pseudo labels. Extensive experiments on
video sequence verification and text-to-video matching show that our method
outperforms baselines by a large margin, which validates the effectiveness of
our proposed approach. Code is available at https://github.com/svip-lab/WeakSVR. Comment: CVPR 2023. Code: https://github.com/svip-lab/WeakSV
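The two ingredients named in the abstract, a CLIP-style contrastive loss and sequential pseudo labels, can be sketched as follows. This is an illustrative approximation rather than the authors' code: the function names, the temperature value, and in particular the uniform split of frames across sentences are assumptions, since the abstract does not specify how the pseudo correspondence is actually generated.

```python
import numpy as np

def contrastive_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each matrix is assumed to be a matched pair."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature  # cosine similarities scaled by temperature

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the video-to-text and text-to-video directions.
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T))

def uniform_pseudo_alignment(n_frames, n_sentences):
    """Assumed stand-in for pseudo labels: split frames evenly, in temporal order."""
    return (np.arange(n_frames) * n_sentences) // n_frames

matched = contrastive_loss(np.eye(4), np.eye(4))
mismatched = contrastive_loss(np.eye(4), np.roll(np.eye(4), 1, axis=0))
print(matched < mismatched)
print(uniform_pseudo_alignment(6, 3).tolist())
```

The same loss function could be applied at both granularities: once to whole-video and paragraph embeddings, and once to frame and sentence embeddings paired through the pseudo alignment.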
SampleHST: Efficient On-the-Fly Selection of Distributed Traces
Since only a small number of the traces generated by distributed tracing help
in troubleshooting, their storage requirement can be significantly reduced by
biasing the selection towards anomalous traces. To aid in this scenario, we
propose SampleHST, a novel approach to sample on-the-fly from a stream of
traces in an unsupervised manner. SampleHST adjusts the storage quota of normal
and anomalous traces depending on the size of its budget. Initially, it
utilizes a forest of Half Space Trees (HSTs) for trace scoring. This is based
on the distribution of the mass scores across the trees, which characterizes
the probability of observing different traces. The mass distribution from HSTs
is subsequently used to cluster the traces online leveraging a variant of the
mean-shift algorithm. This trace-cluster association eventually drives the
sampling decision. We have compared the performance of SampleHST with a
recently suggested method using data from a cloud data center and demonstrated
that SampleHST improves sampling performance by up to 9.5x. Comment: 10 pages, 5 figures
A tricarboxylic acid cycle-based machine learning model to select effective drug targets for the treatment of esophageal squamous cell carcinoma
Background: The tricarboxylic acid cycle (TCA cycle) is an important metabolic pathway and is closely related to tumor development. However, its role in the development of esophageal squamous cell carcinoma (ESCC) has not been fully investigated. Methods: The RNA expression profiles of ESCC samples were retrieved from the TCGA database, and the GSE53624 dataset was additionally downloaded from the GEO database as the validation cohort. Furthermore, the single-cell sequencing dataset GSE160269 was downloaded. TCA cycle-related genes were obtained from the MSigDB database. A risk score model for ESCC based on the key genes of the TCA cycle was built, and its predictive performance was evaluated. The association of the model with immune infiltration and chemoresistance was analyzed using the TIMER database, the R package “oncoPredict”, the TIDE score, and other tools. Finally, the role of the key gene CTTN was validated through gene knockdown and functional assays. Results: A total of 38 clusters of 8 cell types were identified using the single-cell sequencing data. The cells were divided into two groups according to the TCA cycle score, and 617 genes were identified that were most likely to influence the TCA cycle. By intersecting 976 key genes of the TCA cycle with the results of WGCNA, 57 genes significantly associated with the TCA cycle were further identified, of which 8 were screened through Cox regression and Lasso regression to construct the risk score model. The risk score was a good predictor of prognosis across subgroups of age, N and M classification, and TNM stage. Furthermore, BI-2536, camptothecin and NU7441 were identified as possible drug candidates in the high-risk group. The high-risk score was associated with decreased immune infiltration in ESCC, and the low-risk group had better immunogenicity. In addition, we evaluated the relationship between risk scores and immunotherapy response rates.
Functional assays showed that CTTN may affect the proliferation and invasion of ESCC cells through the EMT pathway. Conclusion: We constructed a predictive model for ESCC based on TCA cycle-associated genes, which achieved good prognostic stratification. The model is likely associated with the regulation of tumor immunity in ESCC.
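A risk score of this kind is commonly computed as a weighted sum of gene expression values, with the weights taken from the fitted Cox model. The sketch below shows that generic construction under stated assumptions: the coefficients and expression values are made up for illustration, only two genes are used instead of eight, and the median cutoff for the high-risk and low-risk groups is an assumed convention that the abstract does not confirm.

```python
import numpy as np

def risk_scores(expression, coefficients):
    """Linear predictor: one score per sample from per-gene Cox-style weights."""
    return np.asarray(expression, dtype=float) @ np.asarray(coefficients, dtype=float)

def split_by_median(scores):
    """Label samples strictly above the median score as high risk (assumed cutoff)."""
    return np.where(scores > np.median(scores), "high", "low")

# Hypothetical data: rows are samples, columns are two illustrative genes.
expression = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.0, 0.0]]
coefficients = [0.5, -0.2]  # made-up weights, not values from the study

scores = risk_scores(expression, coefficients)
print(scores.tolist())
print(split_by_median(scores).tolist())
```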
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models
Aligned large language models (LLMs) demonstrate exceptional capabilities in
task-solving, following instructions, and ensuring safety. However, the
continual learning aspect of these aligned LLMs has been largely overlooked.
Existing continual learning benchmarks lack sufficient challenge for leading
aligned LLMs, owing to both their simplicity and the models' potential exposure
during instruction tuning. In this paper, we introduce TRACE, a novel benchmark
designed to evaluate continual learning in LLMs. TRACE consists of 8 distinct
datasets spanning challenging tasks including domain-specific tasks,
multilingual capabilities, code generation, and mathematical reasoning. All
datasets are standardized into a unified format, allowing for effortless
automatic evaluation of LLMs. Our experiments show that after training on
TRACE, aligned LLMs exhibit significant declines in both general ability and
instruction-following capabilities. For example, the accuracy of llama2-chat
13B on the gsm8k dataset declined precipitously from 28.8% to 2% after training
on our datasets. This highlights the challenge of finding a suitable tradeoff
between achieving performance on specific tasks and preserving the original
prowess of LLMs. Empirical findings suggest that tasks inherently equipped with
reasoning paths contribute significantly to preserving certain capabilities of
LLMs against potential declines. Motivated by this, we introduce the
Reasoning-augmented Continual Learning (RCL) approach. RCL integrates
task-specific cues with meta-rationales, effectively reducing catastrophic
forgetting in LLMs while expediting convergence on novel tasks.
- …